Abstract.

Remote sensing of chemical vapour plumes is a difficult but important task with many military and
civilian applications. Hyperspectral sensors operating in the long wave infrared (LWIR) regime have
well-demonstrated detection capabilities. However, the identification of a plume’s chemical
constituents, based on a chemical library, is a multiple hypothesis testing problem which standard
detection metrics do not fully describe. We propose using an additional performance metric for
identification based on the so-called Dice index. Our approach partitions and weights a confusion
matrix to develop both the standard detection metrics and identification metric. Using the proposed
metrics, we demonstrate that the intuitive system design of a detector bank followed by an
identifier is indeed justified when incorporating performance information beyond the standard
detection metrics.
1 Introduction

Passive hyperspectral sensors operating in the longwave infrared (LWIR) provide high resolution
measurements in a region of the electromagnetic spectrum where many chemicals have unique
absorption profiles. The high spectral and spatial resolution of these sensors allows for the
identification of the individual chemicals within a gaseous plume.1 A library of chemical absorption
signatures is often all the prior knowledge available, and it is the job of the signal processing
algorithms to decide if chemicals are present, and also which ones; the process of finding which
chemicals are present in the plume is known as identification. A simple but effective system design
consists of separate detection and identification algorithms, as shown in Fig. 1. In order to assess
the performance of such a system, and to compare performance of different algorithms, it is
necessary to define metrics that address both the detection and identification tasks. In this paper,
we propose such a performance evaluation methodology based on confusion matrices. Furthermore,
we employ this methodology to demonstrate that the design of a state-of-the-art detection
algorithm followed by an identification algorithm is superior to that of either algorithm individually.

The difference between detection and identification is somewhat subtle. We categorize problems
depending on the number of chemicals in the library and whether mixtures of chemicals can
be present or not. These distinctions can be summarized as

1. Looking for a specific gas;

2. Looking for a single gas from a library of L gases;

3. Looking for mixtures of up to m gases from a library of L gases.

Fig 1: Practical chemical detection-identification system. The identifier has one output for each
chemical in a known library.

The  first  case  is  a  detection  problem  where  the  library  contains  a  single  gas  whose  presence  or 
absence  needs  to  be  decided;  detection  is  inherently  a  binary  hypothesis  problem.  When  the  library 
contains  L  chemicals  and  we  are  trying  to  determine  which  one  is  present,  we  no  longer  have  a  pure 
detection  problem  as  there  are  multiple  hypotheses  to  choose  from.  In  Case  2,  not  only  do  we  have 
the  presence  or  absence  of  the  plume  to  decide,  but  also  which  single  chemical  is  actually  present. 
In  Case  3,  instead  of  only  picking  a  single  chemical,  a  mixture  of  chemicals  can  be  present. 

Chemical  identification  can  be  formulated  as  a  multiple  hypothesis  testing  problem  where  each 
hypothesis  represents  a  subset  of  the  library  gases,  including  the  empty  set.  Each  pixel  in  the  scene 
has  a  true  and  an  output  hypothesis  or  class,  which  may  differ.  A  natural  way  to  represent  the 
performance  of  such  a  system  is  through  a  confusion  matrix,  or  error  matrix,  where  each  pair  of 
truth  and  output  is  represented  by  a  single  entry  of  the  matrix.  Each  entry  of  the  matrix  contains 
tallies  of  the  number  of  pixels  with  the  corresponding  truth  and  output.  A  particular  dataset  and 
threshold  produces  a  single  realization  of  the  confusion  matrix,  which  can  then  be  summarized 
in  several  useful  performance  metrics.  For  detection  problems  a  single  threshold  determines  the 
operating  point  and  the  confusion  matrix  can  be  used  to  estimate  probability  of  detection  (PD) 
and  probability  of  false  alarm  (PFA).  Sweeping  a  range  of  thresholds  leads  to  a  plot  of  PD  versus 
PFA,  called  a  receiver  operating  characteristic  (ROC)  curve.  A  ROC  curve  fully  characterizes 
performance  for  detection  problems.2  In  cases  2  and  3,  the  confusion  matrix  becomes  larger  and 
more  difficult  to  interpret  and  in  general  may  not  be  governed  by  a  single  threshold.  When  multiple 
thresholds  are  used,  the  construction  of  a  ROC  surface  which  characterizes  performance  is  possible, 
but  is  not  easily  visualized  and  is  still  difficult  to  interpret.3  The  algorithms  we  consider  use  only  a 
single  threshold,  considerably  simplifying  analysis.  Evaluating  the  system  for  a  range  of  thresholds 
then  produces  a  performance  curve  for  each  metric  used,  from  which  general  performance  trends 
can  be  assessed. 
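As an illustration of how a single-threshold detector yields one operating point per threshold, the following minimal Python sketch tallies the 2 x 2 confusion matrix at each threshold and reports the estimated PFA and PD. The function name `roc_points` and the score values are hypothetical and are not taken from the experiments in this paper.

```python
def roc_points(scores, truth, thresholds):
    """Return (threshold, PFA, PD) for each threshold.

    scores: detector output per pixel; truth: True where plume is present.
    """
    n_plume = sum(1 for t in truth if t)
    n_background = len(truth) - n_plume
    points = []
    for thr in thresholds:
        # Tallies of the 2x2 confusion matrix at this operating point.
        hits = sum(1 for s, t in zip(scores, truth) if t and s > thr)
        false_alarms = sum(1 for s, t in zip(scores, truth) if not t and s > thr)
        points.append((thr, false_alarms / n_background, hits / n_plume))
    return points


# Hypothetical scores: four background pixels followed by four plume pixels.
scores = [0.05, 0.10, 0.20, 0.60, 0.40, 0.90, 0.80, 0.30]
truth = [False, False, False, False, True, True, True, True]
for thr, pfa, pd in roc_points(scores, truth, [0.25, 0.5, 0.75]):
    print(f"threshold={thr:.2f}  PFA={pfa:.2f}  PD={pd:.2f}")
```

Plotting PD against PFA for a dense set of thresholds would trace the ROC curve; the same sweep applies to any other scalar metric derived from the confusion matrix.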

The appropriate metrics to use in summarizing a confusion matrix depend on the particular
application. In hyperspectral chemical detection and identification problems, the number of
background (no gas) pixels is far larger than the number of plume pixels, making the false alarm rate
one of the most important metrics of performance when operating a real system. Therefore, our
approach partitions the confusion matrix into two parts: one involving the false alarm rate, and the
other involving the pixels that contain plume. For the portion of the confusion matrix containing
plume pixels, we utilize two different metrics, the correct detection rate and the Dice index. The
correct detection rate is the fraction of plume pixels for which at least one chemical is correctly
detected. The identification metric we choose is known as the Dice index and is based on the amount
of agreement between the truth and output hypotheses.4,5 Unlike the correct detection rate, the Dice
index considers both the number of correct chemicals in the output and the number of incorrect
chemicals. Since identification deals with mixtures of chemicals and the correct detection rate does
not incorporate mixtures, the Dice index is the metric we choose for evaluating identification
performance. Ultimately, a good system should have a low false alarm rate, a high correct detection
rate, and high identification performance as measured by the Dice index.

The system we evaluate is a detector bank followed by an identifier. The detector bank is
composed of adaptive coherence (cosine) estimator (ACE) detectors,6 while the identifier is the
Bayesian model averaging (BMA) approach.7,8 Intuitively, a bank of detectors should perform worse
as an identifier than an algorithm designed for identification; similarly, an identifier should
perform worse at detection than a detector. We use the proposed identification metric and standard
detection metrics to demonstrate that the ACE detector bank has better detection performance than
BMA, but also has lower identification performance than BMA. These results suggest using the
detector bank followed by the identifier for improved performance over either individually, which
we demonstrate to be the case.

For single gas problems, algorithms can generally be categorized as classical detection
algorithms or regression-based algorithms. Classical detection algorithms include the matched
filter, matched filter variants, the spectral angle measure (SAM), and the adaptive cosine estimator
(ACE).2 Regression-based algorithms fit the data to a regression model and then use either a
significance test or a threshold on the model coefficients to determine whether or not a particular
gas is present.9 We choose the ACE algorithm as a detector since it is a very effective and popular
detection algorithm for hyperspectral imagery.

Identifiers are designed for cases where mixtures of gases are permissible and the constituents
are unknown. Several identification techniques have been proposed, including a bank of detectors,6
linear regression models with significance testing,10,11 step-wise regression,12 and Bayesian
techniques.7,13,14 Unlike detectors, which consider only a single gas, identifiers make a decision
for each chemical in the library. BMA was selected for the identifier because it is considered
state-of-the-art for identification.

To understand how performance is affected by the parameters of the plume, we used a plume
embedding procedure to produce synthetic plume data that preserves the variability of the
background data. Plume thickness has a non-linear relationship with the measured signal, which can
be exploited by non-linear fitting algorithms, but causes additional fitting error in linear
techniques like the ones we consider.15 Both algorithms were tested individually on embedded data
for a range of thickness parameters. The cascaded system was then tested for the same parameter
ranges.

The key results of our study show that both the detector and the identifier are needed for overall
good performance, a result that has not been demonstrated in the literature before. To our
knowledge, neither the use of a confusion matrix to develop a series of performance metrics nor the
use of the Dice index as a performance measure has been reported in this field. Our approach
provides a starting place for comparative analysis of other system designs.

The remainder of this paper is organized as follows. Section 2 presents background material
on the phenomenology of the data and explains the simplifications used in deriving useful models.
In Section 3, we discuss confusion matrices and the proposed identification performance metric.
In Section 4, the two identification algorithms are defined, and the key formulas are presented. In
Sections 5.1 and 5.2, we compare the detection and identification performance of the two
identification algorithms individually. The effect of plume thickness on identification performance
for each algorithm is explored in Section 5.3. Performance of the cascaded system with respect to
plume thickness is examined in Section 5.4. Finally, in Section 6, we provide a short summary of
the paper and discuss future work.

2  At-Sensor  Radiance  Signal  Model 

A  simple  but  useful  model  for  the  at-sensor  radiance  in  the  longwave  infrared  (LWIR)  can  be 
developed  from  a  full  radiative  transfer  model  with  a  few  key  assumptions: 

1. The plume is optically thin and the distance between the plume and background is small
enough to neglect the atmospheric transmittance in that region;

2.  The  plume  is  homogeneous  in  temperature  and  composition; 

3.  Scattering  and  reflections  can  be  neglected. 

These simplifications allow the use of the three layer model of Fig. 2 which can be used to
derive our primary measurement equations using Kirchhoff’s law.16 From Fig. 2 the measured
radiance for an off-plume pixel is

$$L_{\mathrm{off}}(\lambda) = (1 - \tau_a(\lambda))\,B(\lambda, T_a) + \tau_a(\lambda)\,L_b(\lambda)$$

where $\lambda$ is the wavelength (or wavenumber), $T_a$ is the temperature of the atmosphere,
$\tau_a$ is the transmittance of the atmosphere, $L_b$ is the background radiance, and
$B(\lambda, T)$ is the Planck function, which describes a black body at temperature $T$.


Fig 2: Simplified 3 layer radiance model for thin plumes. Layer 1 is the atmosphere, Layer 2 is the
plume, and Layer 3 is the background.


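The measured radiance for a pixel with plume becomes

$$L_{\mathrm{on}}(\lambda) = (1 - \tau_a(\lambda))\,B(\lambda, T_a) + \tau_a(\lambda)(1 - \tau_p(\lambda))\,B(\lambda, T_p) + \tau_a(\lambda)\,\tau_p(\lambda)\,L_b(\lambda),$$

where $\tau_p$ is the transmittance of the plume and $T_p$ is the plume temperature. In terms of
$L_{\mathrm{off}}$, we instead have

$$L_{\mathrm{on}}(\lambda) = (1 - \tau_p(\lambda))\,\tau_a(\lambda)\,\bigl(B(\lambda, T_p) - L_b(\lambda)\bigr) + L_{\mathrm{off}}(\lambda). \tag{1}$$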
The model in Eq. 1 is useful for analysis and gives a clear method for generating synthetic data
under certain circumstances; namely, that the plume and atmosphere are in equilibrium.
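The equivalence between the full three layer expression and Eq. 1 can be checked numerically. The sketch below is only an illustration: the wavelength, transmittances and temperatures are made-up values, and the background radiance is assumed, for convenience, to be that of a black body at a hypothetical temperature of 295 K.

```python
import math

H = 6.62607015e-34   # Planck constant, J s
C = 2.99792458e8     # speed of light, m/s
KB = 1.380649e-23    # Boltzmann constant, J/K


def planck(wavelength_um, temp_k):
    """Black-body spectral radiance B(lambda, T) in W / (m^2 sr m)."""
    lam = wavelength_um * 1e-6
    return (2.0 * H * C**2 / lam**5) / math.expm1(H * C / (lam * KB * temp_k))


def radiance_off(lam, tau_a, temp_a, l_b):
    """Off-plume radiance: atmospheric emission plus transmitted background."""
    return (1.0 - tau_a) * planck(lam, temp_a) + tau_a * l_b


def radiance_on_full(lam, tau_a, tau_p, temp_a, temp_p, l_b):
    """On-plume radiance written as the sum of the three layer contributions."""
    return ((1.0 - tau_a) * planck(lam, temp_a)
            + tau_a * (1.0 - tau_p) * planck(lam, temp_p)
            + tau_a * tau_p * l_b)


def radiance_on_eq1(lam, tau_a, tau_p, temp_a, temp_p, l_b):
    """On-plume radiance written relative to the off-plume radiance (Eq. 1)."""
    return ((1.0 - tau_p) * tau_a * (planck(lam, temp_p) - l_b)
            + radiance_off(lam, tau_a, temp_a, l_b))


# Hypothetical values: 10 um, plume 10 K colder than the background.
lam, tau_a, tau_p = 10.0, 0.9, 0.95
temp_a, temp_p, temp_b = 290.0, 285.0, 295.0
l_b = planck(lam, temp_b)   # assumption: background radiates as a black body

full = radiance_on_full(lam, tau_a, tau_p, temp_a, temp_p, l_b)
eq1 = radiance_on_eq1(lam, tau_a, tau_p, temp_a, temp_p, l_b)
off = radiance_off(lam, tau_a, temp_a, l_b)
print(full, eq1, full - off)
```

With a plume colder than the background the plume term is negative, so the on-plume radiance falls below the off-plume radiance (an absorption feature).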

The transmittance of the plume $\tau_p$ is governed by Beer’s law:16

$$\tau_p(\lambda) = \exp\left(-\sum_{i=1}^{m} \alpha_i s_i(\lambda)\right) \tag{2}$$

where $m$ is the number of gases in the plume, $\alpha_i$ is the concentration path length (CL) for
gas $i$ and $s_i$ is the gas’ absorption spectrum. When the thermal contrast $\Delta T = T_p - T_b$
between the plume and background is small, and the background radiance is slowly varying with
respect to wavelength, the difference $B(\lambda, T_p) - L_b(\lambda)$ is approximately
proportional to $\Delta T$. Using the approximation $(1 - e^{-x}) \approx x$, we obtain

$$x(\lambda) \approx \Delta T\,\tau_a(\lambda) \sum_{i=1}^{m} \alpha_i s_i(\lambda) + L_{\mathrm{off}}(\lambda) \tag{3}$$

which is linear in terms of the gas signatures. Assuming the atmospheric transmission
$\tau_a(\lambda)$ is known, the signatures are multiplied by the atmospheric transmission $\tau_a$
before further processing is done. Ultimately, the temperature and CL both need to be estimated, but
linear techniques estimate the product of the two. The problem of estimating the individual
temperature and emissivity quantities is known as temperature-emissivity separation. For our
purposes we estimate $\Delta T \alpha_i$ as a single quantity $b_i$.
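To give a feel for when the thin-plume linearization is reasonable, the following sketch compares the exact plume term $1 - \tau_p$ from Beer's law with its linear approximation at a single wavelength. The two-gas spectrum values and CL scales are hypothetical and chosen only for illustration.

```python
import math


def plume_term_exact(cl_values, spectra_at_lambda):
    """1 - tau_p(lambda), with tau_p given by Beer's law (Eq. 2)."""
    optical_depth = sum(a * s for a, s in zip(cl_values, spectra_at_lambda))
    return 1.0 - math.exp(-optical_depth)


def plume_term_linear(cl_values, spectra_at_lambda):
    """Thin-plume approximation 1 - exp(-x) ~ x used to reach Eq. 3."""
    return sum(a * s for a, s in zip(cl_values, spectra_at_lambda))


# Two gases with peak-normalized absorption sampled at one wavelength (made-up values).
spectra = [1.0, 0.4]
for scale in (0.01, 0.1, 1.0):
    cl = [scale, scale]
    exact = plume_term_exact(cl, spectra)
    linear = plume_term_linear(cl, spectra)
    rel_err = (linear - exact) / exact
    print(f"CL={scale:<5} exact={exact:.4f} linear={linear:.4f} relative error={rel_err:.1%}")
```

The relative error grows roughly as half the optical depth for small values, which is consistent with the linear model being a good approximation only for thin plumes.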

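The input signal is convolved with the sensor response function and sampled at a set of band
centers $[\lambda_1, \ldots, \lambda_p]$ to produce a measurement vector
$\mathbf{x} = [x_1, \ldots, x_p]^T$ where $p$ is the number of sensor channels. The atmospheric
transmission $\tau_a(\lambda)$ is applied to library signatures $s_i(\lambda)$ and the product
sampled to the sensor’s resolution using the sensor’s spectral response to obtain sampled
signatures $\mathbf{s}_i$. Organizing the signatures as a matrix $\mathbf{S}$ and the $b_i$’s as a
vector $\mathbf{b}$, we have

$$\mathbf{x} = \sum_{i=1}^{m} \mathbf{s}_i b_i + \mathbf{v} = \mathbf{S}\mathbf{b} + \mathbf{v}, \qquad \mathbf{v} \sim \mathcal{N}(\mathbf{m}_b, \mathbf{C}_b) \tag{4}$$

where $\mathbf{m}_b$ is the background clutter mean and $\mathbf{C}_b$ is the clutter covariance.
The assumption in Eq. 4 is that the clutter $\mathbf{v}$ is well modeled by a multivariate Gaussian
distribution, which may not hold in reality, but is a useful model for many practical algorithms.
Defining the whitening matrix $\mathbf{C}_b^{-1/2}$ and whitened vectors

$$\tilde{\mathbf{x}} = \mathbf{C}_b^{-1/2}(\mathbf{x} - \mathbf{m}_b), \quad \tilde{\mathbf{S}} = \mathbf{C}_b^{-1/2}\mathbf{S}, \quad \tilde{\mathbf{v}} = \mathbf{C}_b^{-1/2}(\mathbf{v} - \mathbf{m}_b) \tag{5}$$

yields the standard regression model

$$\tilde{\mathbf{x}} = \sum_{i=1}^{m} \tilde{\mathbf{s}}_i b_i + \tilde{\mathbf{v}} = \tilde{\mathbf{S}}\mathbf{b} + \tilde{\mathbf{v}}, \qquad \tilde{\mathbf{v}} \sim \mathcal{N}(\mathbf{0}, \mathbf{I}) \tag{6}$$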

where  the  clutter  is  zero  mean  and  has  identity  covariance.  The  linear  model  of  Eq.  6  is  useful 
for  developing  the  identification  algorithms  we  consider,  but  is  a  good  approximation  only  for  thin 
plumes. 
Fig 3: Algorithm independent way for confusion matrix construction.

3  Plume  Identification  Performance  Metrics 

The framework we present for evaluating identifiers relies on using the confusion matrix for
multi-class problems. For a single dataset, the confusion matrix has all the performance information
available in detail. However, for ease of interpretation, the confusion matrix may be summarized
using several scalar performance metrics that are appropriate for detection and identification.

Assume that for each pixel we have the identifier output $\mathbf{g}$ and the “ground” truth
$\mathbf{t}$. These are binary vectors given by $\mathbf{g} = [g_1, g_2, \ldots, g_L]^T$ and
$\mathbf{t} = [t_1, t_2, \ldots, t_L]^T$ where $g_k$ is an indicator for the $k$th gas being present
in the identifier output, and $t_k$ is an indicator for whether the $k$th gas is truly present in
the pixel, i.e.

$$g_k = \begin{cases} 1, & \text{gas } k \text{ identified in pixel} \\ 0, & \text{gas } k \text{ not identified in pixel} \end{cases} \qquad t_k = \begin{cases} 1, & \text{gas } k \text{ is present in pixel} \\ 0, & \text{gas } k \text{ is absent from pixel.} \end{cases}$$

The binary vectors $\mathbf{g}$ and $\mathbf{t}$ have $M$ unique configurations depending on the
maximum number of gases allowed in the output. The allowed configurations are denoted
$\mathbf{g}_i$ and $\mathbf{t}_j$ with $i, j \in \{1, \ldots, M\}$. Each possible truth vector
$\mathbf{t}_j$ is assigned to a column of the confusion matrix (CM), while every possible identifier
output is assigned to a row of the confusion matrix; each cell of the CM corresponds to a particular
pair of truth and output vectors, and each pixel is assigned to a particular cell based on the
vectors associated with it. In summary, the CM contains in element $(i, j)$ a tally of the number
of pixels with output $\mathbf{g}_i$ and truth $\mathbf{t}_j$. Operationally, the CM is constructed
by tallying each pixel in the correct entry of the CM based on the identifier output and truth, as
illustrated in Fig. 3.
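The tallying procedure can be sketched in a few lines of Python. The helper names `configurations` and `confusion_matrix`, the ordering of the configurations (empty set first, then by number of gases), and the six example pixels are all assumptions made for illustration.

```python
from itertools import combinations


def configurations(library_size, max_gases):
    """All binary indicator vectors with at most max_gases ones, empty set first."""
    configs = []
    for k in range(max_gases + 1):
        for present in combinations(range(library_size), k):
            configs.append(tuple(1 if i in present else 0 for i in range(library_size)))
    return configs


def confusion_matrix(outputs, truths, configs):
    """cm[i][j] counts pixels with identifier output configs[i] and truth configs[j]."""
    index = {cfg: n for n, cfg in enumerate(configs)}
    cm = [[0] * len(configs) for _ in configs]
    for g, t in zip(outputs, truths):
        cm[index[tuple(g)]][index[tuple(t)]] += 1
    return cm


# Library of two gases with mixtures allowed: (0,0), (1,0), (0,1), (1,1).
configs = configurations(library_size=2, max_gases=2)
truths = [(0, 0), (0, 0), (1, 0), (1, 1), (1, 1), (0, 1)]
outputs = [(0, 0), (1, 0), (1, 0), (1, 1), (1, 0), (0, 0)]
cm = confusion_matrix(outputs, truths, configs)
for row in cm:
    print(row)
```

Rows index the identifier output and columns index the truth, matching the convention in the text, so the first column collects the background pixels and any off-diagonal entry in it is a false alarm.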

The confusion matrix varies in size depending on both the size of the library and whether
mixtures are allowed. In terms of hypothesis testing or classification, each hypothesis $H_k$ has a
corresponding binary indicator vector $\mathbf{g}$ or $\mathbf{t}$ depending on which library
chemicals are present and on whether the hypothesis is the true one or the output from the system.
When looking for only a single gas, the confusion matrix is only 2 x 2 as in Fig. 4a and a single
threshold controls whether a pixel is assigned to the null-hypothesis ($H_0$) or the gas present
hypothesis ($H_1$). When looking for one gas out of a library of size $L$, the CM is size
$[L + 1] \times [L + 1]$ with the possibility of both correct identifications and incorrect
identifications, as in Fig. 4b. Incorrect identifications occur when one chemical is mistaken for
another or when there is no overlap between the chemicals in the truth and output. In Fig. 4b, the
hypotheses $H_1$ and $H_2$ represent each chemical from a library of size two; the corresponding
binary vectors are $\mathbf{g}, \mathbf{t} = [1\ 0]$ or $[0\ 1]$. Hypothesis $H_3$ represents the
presence of both chemicals in the plume. When looking for up to $m$ of $L$ gases the CM is of size
$M \times M$ with $M = \sum_{k=0}^{m} \binom{L}{k}$, but in general when any mixture of chemicals is
allowed $M = 2^L$. In Fig. 4c, the full CM for a library of size two is shown; the hypothesis $H_3$
represents the mixture of both chemicals. When there is some overlap between the chemicals that are
detected and the chemicals actually present we have a partially correct identification. In general,
since the CM, and the number of hypotheses to test, grows exponentially with the size of the
library, it is impractical to fill out the full confusion matrix or to test all possible models.
For even moderately sized threat libraries the confusion matrix becomes difficult to interpret
because of its size, necessitating summarization.

Fig 4: Different confusion matrices depending on the particular problem. (a) Detection type
problem. (b) 1 of L gases with library of size 2. (c) Up to two gases with library of size 2. $H_0$
is the null-hypothesis; $H_1$ and $H_2$ contain gas 1 and 2 respectively, while $H_3$ has both.

The CM can be summarized by partitioning, weighting and then averaging. For plume identification
applications the CM can be partitioned into several sub-matrices that contain: false alarms,
misses, correct IDs, and incorrect IDs as shown in Fig. 4. Broadly, the false alarm section is the
column where no gases are present ($\mathbf{t}_j = \mathbf{0}$) and the other cases occur when at
least one gas is present. Performance metrics can be calculated using portions of the CM as follows:
choose a partition of the CM; sum the elements of the partition; apply a weight matrix
$\mathbf{W}$ with weights $w_{ij}$ to the CM; sum the weighted elements of the partition; take the
ratio of the weighted and unweighted sums. These operations can be written succinctly as

$$d_{\mathrm{perf}} = \frac{\sum_{(i,j) \in \mathcal{S}} [\mathbf{W} \odot \mathbf{CM}]_{ij}}{\sum_{(i,j) \in \mathcal{S}} [\mathbf{CM}]_{ij}}, \qquad 0 \le d_{\mathrm{perf}} \le 1 \tag{7}$$

where $\odot$ denotes element-wise multiplication, and the set $\mathcal{S}$ represents the
sub-matrix to sum over. The brackets $[\cdot]$ with subscripts indicate a single element of the
matrix within the brackets.

| $\beta$ | $w_{ij}$ | Name(s) | Description |
|---|---|---|---|
| 0 | $(\mathbf{g}_i \cdot \mathbf{t}_j) / \lvert\mathbf{t}_j\rvert$ | Sensitivity, Recall | Fraction of gases in pixel that are detected. |
| 1/2 | $2(\mathbf{g}_i \cdot \mathbf{t}_j) / (\lvert\mathbf{g}_i\rvert + \lvert\mathbf{t}_j\rvert)$ | Dice index, F-metric | Incorporates both number of gases identified and number actually present. Harmonic mean of precision and recall. |
| 1 | $(\mathbf{g}_i \cdot \mathbf{t}_j) / \lvert\mathbf{g}_i\rvert$ | Precision | Fraction of gases detected that are present in the pixel. |

Table 1: Weights derived from Eq. 8 using different values of $\beta$.
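As a minimal sketch of Eq. 7, the function below takes a confusion matrix, a weight matrix and a partition given as a list of (row, column) index pairs. The example matrix is invented, and the 0/1 weighting used for the false alarm rate is an assumption about how that metric maps onto the weighting scheme.

```python
def summarize(cm, weights, partition):
    """Eq. 7: weighted tally over a partition divided by the unweighted tally."""
    weighted = sum(weights[i][j] * cm[i][j] for i, j in partition)
    total = sum(cm[i][j] for i, j in partition)
    return weighted / total if total else float("nan")


# Rows are outputs and columns are truths, ordered H0, H1, H2, H3 (two-gas library).
cm = [
    [90, 2, 3, 0],
    [6, 15, 1, 4],
    [4, 1, 14, 2],
    [0, 2, 2, 14],
]

# False alarm rate: the partition is the no-gas truth column, and every
# non-empty output in that column is counted with weight one (assumed weighting).
false_alarm_partition = [(i, 0) for i in range(4)]
false_alarm_weights = [[1 if (j == 0 and i != 0) else 0 for j in range(4)]
                       for i in range(4)]
far = summarize(cm, false_alarm_weights, false_alarm_partition)
print(f"false alarm rate = {far:.2f}")
```

Other metrics follow by changing only the partition and the weight matrix, which is what makes the summarization independent of the algorithm that produced the confusion matrix.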

Though there are a substantial number of different metrics that can be derived from the CM
depending on the weights used, the metric we use for identification performance comes from the
family of indexes defined by

$$w_{ij} = \frac{\mathbf{g}_i \cdot \mathbf{t}_j}{\beta \lvert\mathbf{g}_i\rvert + (1 - \beta) \lvert\mathbf{t}_j\rvert} \tag{8}$$

with $\beta \in [0, 1]$. The numerator in Eq. 8 indicates the number of agreements between the
output and truth, while the denominator is a weighted sum of the number of chemicals in the truth
and the number in the output. The resulting weights incorporate the truth and output vectors to
varying degrees depending on the value of $\beta$ used. Setting $\beta = 1/2$ in Eq. 8 we obtain
the Dice index

$$w_{ij} = \frac{2(\mathbf{g}_i \cdot \mathbf{t}_j)}{\lvert\mathbf{g}_i\rvert + \lvert\mathbf{t}_j\rvert} \tag{9}$$

or F-metric, not to be confused with the F-test from linear regression.17 We choose the Dice index
for identification because it incorporates both the number of gases in the identifier’s output, and
the number of gases in the pixel. Other choices of $\beta$ lead to metrics that weigh the importance
of $\mathbf{g}$ and $\mathbf{t}$ in different proportions. For example, when $\beta = 0$ the number
of incorrect outputs is not taken into account. Table 1 presents several common weights derived
from Eq. 8 that are used in the literature.18
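The family of weights in Eq. 8 is simple to evaluate for a single pair of output and truth vectors. In the sketch below, the function name `index_weight` is hypothetical, and returning zero when the denominator vanishes is a convention chosen here rather than something specified in the text.

```python
def index_weight(g, t, beta):
    """Eq. 8 weight for output vector g and truth vector t (binary sequences)."""
    agreements = sum(gk * tk for gk, tk in zip(g, t))
    denominator = beta * sum(g) + (1.0 - beta) * sum(t)
    # Assumed convention: a zero denominator yields a weight of zero.
    return agreements / denominator if denominator else 0.0


g = (1, 1, 0)   # identifier reports gases 1 and 2
t = (1, 0, 0)   # only gas 1 is present

recall = index_weight(g, t, 0.0)
dice = index_weight(g, t, 0.5)
precision = index_weight(g, t, 1.0)
print(f"recall={recall:.3f}  Dice={dice:.3f}  precision={precision:.3f}")
```

In this example the extra, incorrect gas leaves recall at one but lowers the Dice index to 2/3, which is also the harmonic mean of the precision (1/2) and the recall (1).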

The three metrics we use are the false alarm rate, the correct detection rate, and identification
performance as measured using the Dice index of Eq. 9. These detection and identification metrics
are listed in Table 2, along with the partitions of the CM used in the weighting and summarization
process. The false alarm rate is very important in both detection and identification systems since
it determines how much background data will have to undergo additional scrutiny. In standoff
systems, the vast majority of data does not contain plume and is only background data; having a
low false alarm rate and having the other metrics as high as possible are desirable system
characteristics.

The methodology we have taken for evaluating performance does not depend on any particular
identification system architecture, except that the system must produce a list of identified
gases. The internal workings of an algorithm are not taken into account when using this approach.
However, the algorithms we analyze in this paper use a single threshold to make decisions. At
each threshold we obtain a single realization of the CM from which each performance metric is
calculated.
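The following sketch shows how the three metrics could be evaluated at each threshold for any algorithm that outputs one score per library gas. It works directly on per-pixel output and truth vectors, which is equivalent to averaging over the corresponding partitions of the CM. The partitions assumed here (background pixels for the false alarm rate, plume pixels for the other two metrics) follow the descriptions in the text, and the scores are invented.

```python
def dice(g, t):
    """Dice index of Eq. 9; zero if both vectors are empty (assumed convention)."""
    total = sum(g) + sum(t)
    return 2.0 * sum(a * b for a, b in zip(g, t)) / total if total else 0.0


def evaluate(score_vectors, truths, threshold):
    """False alarm rate, correct detection rate and mean Dice index at one threshold."""
    outputs = [tuple(int(s > threshold) for s in scores) for scores in score_vectors]
    background = [g for g, t in zip(outputs, truths) if not any(t)]
    plume = [(g, t) for g, t in zip(outputs, truths) if any(t)]
    false_alarm_rate = sum(1 for g in background if any(g)) / len(background)
    correct_detection_rate = sum(
        1 for g, t in plume if any(a and b for a, b in zip(g, t))) / len(plume)
    mean_dice = sum(dice(g, t) for g, t in plume) / len(plume)
    return false_alarm_rate, correct_detection_rate, mean_dice


# Hypothetical per-gas scores for five pixels and a two-gas library.
score_vectors = [(0.1, 0.2), (0.7, 0.1), (0.9, 0.8), (0.9, 0.3), (0.2, 0.1)]
truths = [(0, 0), (0, 0), (1, 1), (1, 1), (0, 1)]
for thr in (0.25, 0.5, 0.75):
    far, cdr, md = evaluate(score_vectors, truths, thr)
    print(f"threshold={thr:.2f}  FAR={far:.2f}  CDR={cdr:.2f}  Dice={md:.2f}")
```

Each threshold gives one point on each performance curve, so sweeping the threshold yields one curve per metric.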


4  Plume  Identification  Techniques 

The two algorithms we compare are a detector bank approach and a model averaging algorithm.
The detector bank has a set of single-gas (single-chemical) detectors, one for each library
signature. Each detector produces a score for a single gas, which is then thresholded to make a
decision about each gas. Similarly, model averaging produces a score for each gas which is then
thresholded to make a decision. Of the popular algorithms, the operational similarity of these two
makes comparative analysis simpler and easier to interpret. In this section we give overviews of the
adaptive coherence estimator (ACE) detector bank and Bayesian model averaging (BMA) identification
techniques, and provide the relevant formulas for each.

4.1  A  Detector  Bank  for  Identification 

Detection algorithms are designed to solve binary hypothesis problems with a known target
signal. Perhaps the most common and well known algorithm is the matched filter.2 The normalized
matched filter (NMF) is a simple modification where the matched filter is normalized by the
measurement length. The matched filter and NMF can take both positive and negative values depending
on the orientation of the input signal relative to the signature. In the LWIR, the relative
direction of the input and signature may depend on the thermal contrast $\Delta T$ from Eq. 3, which
may be positive or negative. To create a sign-insensitive detector the NMF can be squared to obtain
the adaptive coherence estimator (ACE). We define ACE for the $k$th library signature as

$$y_k = \frac{(\tilde{\mathbf{s}}_k^T \tilde{\mathbf{x}})^2}{\lVert\tilde{\mathbf{s}}_k\rVert^2\,\lVert\tilde{\mathbf{x}}\rVert^2} = \cos^2(\theta_k) \tag{10}$$

where $\theta_k$ is the angle between the whitened signature $\tilde{\mathbf{s}}_k$ and the
whitened pixel $\tilde{\mathbf{x}}$. We use the term ACE for the squared detector, though various
terms are used in the literature.19
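A minimal sketch of the ACE statistic is given below. It assumes the signature and pixel have already been whitened with the background statistics, and the three-band vectors are toy values chosen to make the geometry obvious.

```python
def ace_score(signature, pixel):
    """Squared cosine of the angle between a whitened signature and a whitened pixel."""
    dot = sum(s * x for s, x in zip(signature, pixel))
    norm_s = sum(s * s for s in signature)
    norm_x = sum(x * x for x in pixel)
    return dot * dot / (norm_s * norm_x)


s = [1.0, 2.0, 0.0]
parallel = ace_score(s, [2.0, 4.0, 0.0])       # pixel aligned with the signature
flipped = ace_score(s, [-2.0, -4.0, 0.0])      # same pixel with the sign reversed
orthogonal = ace_score(s, [0.0, 0.0, 3.0])     # pixel orthogonal to the signature
print(parallel, flipped, orthogonal)
```

Because the inner product is squared, emission and absorption (positive and negative thermal contrast) give the same score, which is the sign insensitivity described above.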

To  use  ACE  as  an  identifier,  a  bank  of  detectors  can  be  constructed,  where  each  detector  is 
tuned  to  a  particular  library  signature.  For  detection  of  a  specific  gas,  only  one  ACE  detector  is 
needed;  for  detecting  one  of  L  gases,  a  bank  of  L  detectors  can  be  used  and  the  maximum  taken; 
for  the  m  of  L  problem,  instead  of  taking  the  maximum,  the  outputs  can  be  thresholded  to  obtain 
a  list  of  gases. 

When only the maximum of the detector bank is considered, mixtures are excluded from
consideration, which can be problematic when mixtures are present in the data. Picking the maximum
may be appropriate in applications where only a single target is allowed in any pixel; for
example, in the reflective regions of the electromagnetic spectrum, the ground resolution can be
small enough that only a single target can be in any particular pixel.20 For gaseous plumes, we use
the thresholding approach instead of taking a maximum and then thresholding.
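The two ways of turning detector bank outputs into a decision can be contrasted with a short sketch. The function names and the score values are hypothetical; the scores stand in for the outputs of L = 4 single-gas detectors on one pixel.

```python
def bank_threshold(scores, threshold):
    """m-of-L decision: report every gas whose detector score exceeds the threshold."""
    return [k for k, score in enumerate(scores) if score > threshold]


def bank_maximum(scores, threshold):
    """1-of-L decision: report only the strongest detector, if it exceeds the threshold."""
    best = max(range(len(scores)), key=lambda k: scores[k])
    return [best] if scores[best] > threshold else []


scores = [0.62, 0.05, 0.48, 0.01]   # made-up detector outputs for a 4-gas library
print(bank_threshold(scores, 0.3))
print(bank_maximum(scores, 0.3))
```

With these scores the thresholding rule reports gases 0 and 2, whereas the maximum rule reports gas 0 only and therefore cannot represent a two-gas mixture.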

4.2  Bayesian  Model  Averaging 

Bayesian model averaging (BMA) is a technique for estimating parameters using the construction
of a set of models that are fitted to the data. In our case, each model is a linear model for
$\tilde{\mathbf{x}}$ using a unique subset of gases from the library $\mathbf{S}$. Specifically,
model $j$ is in the form of Eq. 6 but with a particular library subset in $\tilde{\mathbf{S}}_j$
and the estimate of CL $\times$ $\Delta T$ in $\mathbf{b}_j$. Each model $M_j$ is defined as

$$M_j : \tilde{\mathbf{x}} = \tilde{\mathbf{S}}_j \mathbf{b}_j + \tilde{\mathbf{v}} \tag{11}$$

where the index $j$ refers to the model, and is a separate index from other sections. Defining
$A_k$ as the event that gas $k$ is present, BMA computes the probability of the event $A_k$ as the
average over all models

$$p_k = \Pr\{A_k \mid \tilde{\mathbf{x}}\} = \sum_j \Pr\{A_k \mid M_j, \tilde{\mathbf{x}}\}\,\Pr\{M_j \mid \tilde{\mathbf{x}}\} \tag{12}$$

where $M_j$ is the $j$th model being considered. The probability
$\Pr\{A_k \mid M_j, \tilde{\mathbf{x}}\}$ is an indicator of whether or not gas $k$ is in the
model. The model probabilities can be calculated using Bayes’ rule

$$\Pr\{M_j \mid \tilde{\mathbf{x}}\} = \frac{\Pr\{\tilde{\mathbf{x}} \mid M_j\}\,\Pr\{M_j\}}{\sum_i \Pr\{\tilde{\mathbf{x}} \mid M_i\}\,\Pr\{M_i\}} \tag{13}$$


where $\Pr\{\tilde{\mathbf{x}} \mid M_j\}$ is the likelihood of the data given the model, and
$\Pr\{M_j\}$ are the prior probabilities of the models. Typically, the likelihood of the data
depends on model parameters that make it difficult to find expressions for the likelihood. Instead,
the likelihood can be approximated using the model’s Bayesian information criterion (BIC) as

$$\Pr\{\tilde{\mathbf{x}} \mid M_j\} \approx \exp\{-\mathrm{BIC}_j / 2\}.$$

For linear regression models, as in Eq. 11, the BIC is

$$\mathrm{BIC}_j = n \ln(\mathrm{RSS}_j / n) + d_j \ln(n)$$

where $n$ is the number of spectral bands, and $d_j$ is the number of gases in model $j$. The first
term in the BIC depends on how well the model fits the data, and the second term is essentially a
model complexity penalty. The residual sum of squares (RSS) is defined as

$$\mathrm{RSS}_j = \tilde{\mathbf{x}}^T (\mathbf{I} - \mathbf{P}_{S_j}) \tilde{\mathbf{x}} = \tilde{\mathbf{x}}^T \mathbf{P}_{S_j}^{\perp} \tilde{\mathbf{x}}$$

where $\mathbf{P}_{S_j} = \tilde{\mathbf{S}}_j (\tilde{\mathbf{S}}_j^T \tilde{\mathbf{S}}_j)^{-1} \tilde{\mathbf{S}}_j^T$.
The additional penalty on the model complexity in the BIC leads to smaller models being more
likely. Only considering models up to a certain size (mixtures of $m$) further favors smaller
models.
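For a model containing a single signature the projection reduces to a rank-one form, so the RSS and BIC can be computed without any matrix inversion. The sketch below uses this special case only; the four-band pixel and signature are toy values, and treating the empty model's RSS as the squared norm of the whitened pixel is the natural limit of the same formula.

```python
import math


def rss_single(signature, pixel):
    """RSS after projecting the whitened pixel onto one whitened signature."""
    dot = sum(s * x for s, x in zip(signature, pixel))
    return sum(x * x for x in pixel) - dot * dot / sum(s * s for s in signature)


def bic(rss, n_bands, n_gases):
    """BIC of a linear model with n_gases signatures fitted to n_bands samples."""
    return n_bands * math.log(rss / n_bands) + n_gases * math.log(n_bands)


pixel = [1.1, 1.9, 0.2, -0.1]       # toy whitened pixel
signature = [1.0, 2.0, 0.0, 0.0]    # toy whitened signature

rss_null = sum(x * x for x in pixel)          # empty model: nothing is fitted
rss_gas = rss_single(signature, pixel)
bic_null = bic(rss_null, len(pixel), 0)
bic_gas = bic(rss_gas, len(pixel), 1)
print(f"RSS null={rss_null:.3f}  RSS gas={rss_gas:.3f}")
print(f"BIC null={bic_null:.2f}  BIC gas={bic_gas:.2f}")
```

The single-gas model has the lower BIC here because the drop in RSS outweighs the ln(n) penalty for the extra coefficient. A perfectly fitted pixel would give an RSS of zero and an undefined logarithm, which a practical implementation would need to guard against.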

Finally, assuming all models are equally likely, and that the models are exhaustive, the model
probabilities must sum to one, and Eq. 13 becomes

$$\Pr\{M_j \mid \tilde{\mathbf{x}}\} = \frac{\exp\{-\mathrm{BIC}_j / 2\}}{\sum_i \exp\{-\mathrm{BIC}_i / 2\}}. \tag{14}$$

Eq. 14 and Eq. 12 together define the probabilities of each model and each gas occurring,
respectively. In BMA the probability for each gas to be present in Eq. 12 is then thresholded to
obtain a list of gases present in each pixel.
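The averaging step can be sketched as follows, taking the BIC of each candidate model as given. The model list, BIC values and the 0.5 threshold are hypothetical; subtracting the smallest BIC before exponentiating is a numerical safeguard added here and does not change the normalized probabilities.

```python
import math


def model_probabilities(bics):
    """Eq. 14 with equal priors; the minimum BIC is subtracted to avoid underflow."""
    best = min(bics)
    weights = [math.exp(-(b - best) / 2.0) for b in bics]
    total = sum(weights)
    return [w / total for w in weights]


def gas_probabilities(models, bics, library_size):
    """Eq. 12: sum the probabilities of the models that contain each gas."""
    probs = model_probabilities(bics)
    return [sum(p for m, p in zip(models, probs) if k in m)
            for k in range(library_size)]


models = [(), (0,), (1,), (0, 1)]        # all subsets of a two-gas library
bics = [0.8, -14.9, -2.0, -13.5]         # made-up BIC values, one per model
p = gas_probabilities(models, bics, 2)
identified = [k for k, pk in enumerate(p) if pk > 0.5]
print([round(pk, 3) for pk in p], identified)
```

In this example gas 0 appears in both well-fitting models and receives a probability close to one, while gas 1 is supported only by the larger, penalized model and falls below the threshold.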


5  Performance  Evaluation  of  Identification  Algorithms 

To evaluate and compare different plume detection and identification systems, ground truth for the dataset is a necessity. In data with real plumes, the spatial extent of the plume, the constituent gases, the concentrations, and temperature of the plume are typically unknown. To obtain a performance estimate with known plume parameters, a synthetic plume embedding technique was used. The embedding technique is based on Eq. 1 and requires a background-only cube and a signature library.15,21

We selected a background cube and used the plume embedding algorithm to produce synthetic plumes with known ground truth at specified concentration pathlength (CL) values. We used a library of eight gas signatures based on the spectral library described in Ref. 22. At least three of the gases in the library have strong spectral features in the same wavelength region; two of these gases were selected for embedding. The signatures were normalized to their maxima prior to embedding; thus the CL values reported can be used to infer the approximate thickness of the plume from Eq. 1 and Eq. 2. The data had 128 channels with centers ranging from 7.6 μm to 13.5 μm; most of the signatures had appreciable absorption peaks over this range of wavelengths. The plume was embedded over a ground portion of the image where the embedding model is most appropriate. Based on an estimate of the background temperature, the plume was simulated to be about 10 K colder than the background; the temperature difference made the plume easily detected.

To  compare  the  ACE  and  BMA  algorithms  fairly,  the  embedding  region  was  excluded  from 
background  mean  and  covariance  estimates,  which  are  substituted  into  Eq.  5.  The  inclusion  of 
the  plume  in  these  estimates  can  lead  to  substantial  performance  degradation.  Both  techniques 
processed  the  entire  cube  and  produced  scores  for  each  pixel  and  each  gas.  BMA  considered 
mixtures  of  up  to  three  gases. 

In the following sections we present results for several experiments using the same embedding region for several embedding scenarios. In Sections 5.1 and 5.2, we consider a single chemical at a single CL of 0.027 and examine the distribution of ACE and BMA scores with respect to threshold. In Sections 5.3 and 5.4, we examine system performance over a range of CLs and embed both a single gas and two gases in the same plume region. In Section 5.4, we propose a cascaded system and examine performance for both the single-gas and two-gas embeddings over the same CL ranges as in the other sections.

5.1  Detection  Performance 

Our analysis presents several histograms that were created as follows. For a background pixel, a false alarm occurs if any gas score exceeds the threshold, which is equivalent to the maximum score among all the gases exceeding the threshold. For a plume pixel, one gas is the correct (embedded) gas and all other library gases are incorrect. The identifier makes a mistake if any incorrect gas is above the threshold, which is equivalent to the maximum score among the incorrect gases exceeding the threshold. The histograms we present in the following sections used only these maxima. The goal is to illustrate how important the threshold is to the performance of the system, and to highlight the difference between good detection performance and good identification performance.

For perfect detection, there should be perfect separation between background and plume scores. As shown in Fig. 5, ACE separates the background (red) and plume (green) quite well. Comparatively, BMA does not separate the background and plume as well. In Fig. 5a the red histogram is composed of the maximum scores for each background pixel since only the maximum score needs to exceed the threshold for the pixel to be a false alarm. The background pixel scores are distributed close to 0, with very few exceeding a value of 0.1. The green histogram contains the scores for the correct gas only, and consists only of the pixels within the embedded plume region; the plume pixel scores are distributed away from 0 with very little overlap between the background and plume scores. The small overlap of the green and red histograms leads to the ACE bank achieving a high correct detection rate for a large range of false alarm rates, as shown in Fig. 6 in red.

In  Fig.  5b  the  background  scores  (red)  of  BMA  are  not  tightly  distributed  near  0,  and  span  the 
full  range  of  thresholds.  There  is  significantly  more  overlap  between  the  background  and  plume 
scores  (green)  of  BMA  than  for  the  detector  bank.  The  overlap  between  the  distributions  means 
that  for  any  threshold,  the  number  of  false  alarms  and  missed  detections  will  be  higher  for  BMA 
than  for  the  ACE  bank,  as  shown  in  Fig.  6  in  blue. 

The  curves  in  Fig.  6  were  constructed  by  sweeping  a  range  of  thresholds  to  produce  a  series  of 
confusion  matrices  from  which  the  performance  metrics  are  computed.  At  any  particular  threshold, 
a  single  realization  of  the  confusion  matrix  is  obtained.  At  each  threshold  PD  and  PFA  were 
estimated  using  the  correct  detection  rate  and  false  alarm  rate  with  the  partitions  and  weights  of 
Table  2.  From  Fig.  6  the  probability  of  detection  for  BMA  is  lower  than  ACE  for  any  PFA.  Since 
the  ROC  curve  of  the  detector  bank  is  above  the  one  for  BMA,  the  detector  bank  is  a  better  detector 
for  this  dataset. 
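
The threshold sweep described above can be sketched as follows. This is a simplified illustration, not the evaluation code used for Fig. 6: it estimates PFA and PD as plain fractions of pixels rather than through the partitions and weights of Table 2, and the function name and score values are hypothetical.

```python
def roc_points(background_scores, plume_correct_scores, thresholds):
    """Return (threshold, PFA, PD) triples for a sweep of thresholds.

    background_scores: one list of per-gas scores for each background pixel.
    plume_correct_scores: the score of the embedded gas for each plume pixel.
    """
    # A background pixel is a false alarm if ANY gas exceeds the threshold,
    # so only the maximum score per background pixel matters.
    bg_max = [max(scores) for scores in background_scores]
    points = []
    for t in thresholds:
        pfa = sum(s > t for s in bg_max) / len(bg_max)
        pd = sum(s > t for s in plume_correct_scores) / len(plume_correct_scores)
        points.append((t, pfa, pd))
    return points


# Made-up scores: 4 background pixels (2 gases each) and 4 plume pixels.
background = [[0.01, 0.05], [0.02, 0.12], [0.03, 0.04], [0.0, 0.02]]
plume = [0.4, 0.6, 0.08, 0.9]
roc = roc_points(background, plume, thresholds=[0.05, 0.1, 0.5])
```

Each threshold yields one operating point; plotting PD against PFA over a dense sweep traces out the ROC curve.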

5.2  Identification  Performance 

Detection performance was measured using the background scores for each gas and the plume scores for the correct gas only. The outputs for the incorrect gases were neglected when considering the plume pixels, but for identification, the scores for the incorrect gases determine whether we made a correct identification or not. Since a single threshold is applied to each output of the identifier, scores for chemicals that are not actually present may exceed the threshold. Multiple thresholds could be used, but selecting a threshold for each chemical individually is not practical without prior knowledge of which chemicals will be present. To have good identification performance, the distributions of scores for the correct library gas and the incorrect library gases should be well separated.

Fig 5: Scaled histograms of representative outputs for (a) ACE and (b) BMA using embedded data. Background and plume pixels are easily separated using ACE, but less so by BMA.

Fig 6: Receiver operating characteristic (ROC) curve (PD vs. PFA) for ACE and BMA based on the scores for the correct gas and for the background. The ROC curves can be constructed from the histograms in Fig. 5.

In  Fig.  8,  the  same  histograms  as  in  the  previous  section  are  presented,  but  the  scores  for  the 
incorrect  gases  are  also  included  (blue).  For  each  pixel  in  the  plume,  the  maximum  score  among  the 
incorrect  gases  was  used  to  construct  the  histogram;  since  a  single  threshold  is  used  to  determine 
which  gases  were  present,  if  the  maximum  exceeded  the  threshold,  then  an  incorrect  or  partial 
identification  occurred.  If  only  the  correct  gas  passes  the  threshold  then  a  correct  identification 
occurred.  In  Fig.  8a,  the  ACE  detector  bank  shows  good  separation  between  the  background  and 
the  plume  for  both  the  correct  and  incorrect  gases;  however,  the  correct  gas  and  maximum  incorrect 
gas  histograms  have  significant  overlap.  In  this  case  the  ACE  bank  does  a  poor  job  of  identifying 
exactly  which  gas  is  present  when  compared  to  BMA’s  results  in  Fig.  8b. 

The identification performance curves shown in Fig. 9 were created by selecting a range of thresholds, computing a confusion matrix for each threshold, and then using the partitioning and weighting scheme described in Section 3 with the Dice weighting of Eq. 9. The resulting performance curves demonstrate that ACE achieves a lower maximum than BMA; BMA has higher identification performance over a wide range of thresholds, achieving its maximum near a threshold of 0.5. The performance curves of Fig. 9 reflect approximately how well the blue and green histograms of Fig. 8 are separated. Overall, the performance curves indicate that BMA selects the correct gas more often than ACE for a wide range of thresholds.

We  interpret  these  results  as  follows.  With  the  chosen  embedding  parameters,  several  gases 
have  similar  ACE  scores  since  each  ACE  detector  considers  only  a  single  gas.  The  embedded 
chemical’s  signature  is  similar  to  two  of  the  other  library  signatures,  leading  to  multiple  gases 
having  similar  ACE  scores  as  indicated  by  the  green  and  blue  histograms  of  Fig.  8a;  poor  separation 
of  different  gases  leads  to  poor  identification  performance  when  compared  to  BMA,  as  shown  in 
Fig.  9.  However,  the  ACE  scores  for  the  plume  pixels  are  higher  than  for  the  background  pixels, 
leading  to  good  detection  performance  as  indicated  by  the  ROC  curve  in  Fig.  6,  and  the  green  and 
red  histograms  of  Fig.  5.  In  contrast,  BMA  achieves  better  separation  of  the  correct  and  incorrect 
gases  leading  to  higher  identification  performance  for  a  wide  range  of  thresholds. 

The identification performance curve has an unusual characteristic compared to detection metrics, which monotonically increase or decrease with threshold. The identification performance curve is the average Dice score for the pixels within the plume at a number of thresholds. Performance initially increases with respect to threshold, reaches a maximum, and then decreases to zero at the maximum threshold. For small thresholds, several chemicals pass the threshold for a large portion of the plume pixels. In the low threshold regions, many of the plume pixels are partial identifications, while for high thresholds there are more misses as in the confusion matrices of Fig. 4c. The number of chemicals in the output of each pixel determines its Dice weight. For a library of 8 gases with 1 chemical embedded and 8 chemicals in the output, the weight is about 0.22, which is approximately the score BMA achieves at the lowest thresholds. For high thresholds, many plume pixels do not pass the threshold at all, leading to many pixels with a score of 0, causing the performance curve to deteriorate. Performance near one indicates that the majority of plume pixels have been correctly identified, i.e. all of the chemicals actually present are correctly identified and there are no extras.
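
The rise-and-fall behaviour can be reproduced with a short Python sketch. It assumes the Dice weight takes the standard form 2|T ∩ O| / (|T| + |O|), where T is the set of gases truly present and O is the set reported for the pixel; this form is consistent with the value of about 0.22 quoted above, but the function names and the toy scores below are hypothetical.

```python
def dice_score(truth, output):
    """Dice index 2|T & O| / (|T| + |O|) between true and reported gas sets."""
    truth, output = set(truth), set(output)
    if not truth:
        raise ValueError("plume pixels must contain at least one true gas")
    return 2.0 * len(truth & output) / (len(truth) + len(output))


def mean_dice(pixel_scores, truth, threshold):
    """Average Dice score over plume pixels at a single threshold."""
    total = 0.0
    for scores in pixel_scores:
        reported = {k for k, s in enumerate(scores) if s > threshold}
        total += dice_score(truth, reported)
    return total / len(pixel_scores)


# One gas embedded, all 8 library gases reported: 2*1 / (1 + 8).
all_reported = dice_score({0}, range(8))

# Made-up scores for two plume pixels and a 3-gas library; gas 0 is embedded.
pixels = [[0.9, 0.6, 0.1], [0.8, 0.2, 0.1]]
curve = [mean_dice(pixels, {0}, t) for t in (0.05, 0.5, 0.95)]
```

With these toy scores the average is low at the smallest threshold (every gas is reported), higher at the middle threshold, and zero at the largest threshold (nothing is reported).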

Since the maximum identification performance of BMA is higher than ACE in this case, BMA can perform better as an identifier given an appropriate threshold. Selecting a threshold is usually accomplished by assuming that each threshold produces a constant false alarm rate (CFAR). While it is possible in principle to select an operating point in this fashion, BMA has a much higher false alarm rate than ACE, as shown in Fig. 7.

Although BMA has a higher maximum performance than the ACE detector bank, the improved identification performance of BMA also comes with an undesirably higher false alarm rate. Chaining the two algorithms, with the detector followed by the identifier, is an obvious way to obtain a lower false alarm rate while retaining the superior identification performance of BMA, as we discuss in Section 5.4.

5.3  Effects  of  Plume  Thickness  on  Performance 

The plume’s thickness (CL) and temperature are the major drivers for how easily detected and identified the plume is. In the previous sections a single CL was used for embedding; in this section, we examine identification performance for a range of CLs. We focus on the effects of CL instead of temperature because the measured signal is approximately linear in terms of the temperature contrast, as in Eq. 3. However, the non-linear relation between plume thickness and the measured signal is one of the reasons chemical identification is challenging. The same background cube and embedding region as in Section 5.1 were used for this section. First, a single gas was embedded for a range of CLs and both algorithms were run on the data. For consistency, the same gas as in Section 5.2 was used. Second, a mixture of two gases was embedded and the experiment repeated. The second gas used in the mixture was one that was spectrally similar to the first one. In both cases, the background statistics were the same for each CL. Consequently, at any particular threshold, the false alarm rate for each algorithm remained the same for all CLs.

Fig 7: False alarm rates of the two identifiers vs. threshold.

Fig 8: Scaled histogram of representative outputs for (a) ACE and (b) BMA using embedded data. Incorrect gases are not easily separated from the correct gas using ACE, and are better separated by BMA.

Fig 9: Identification performance (ID performance vs. threshold) for ACE and BMA over a range of thresholds.

Since we expected identification performance to deteriorate for sufficiently thick plumes, embedding was done for two different ranges of CLs. The first range simulated very thin plumes, while the second range was chosen to show identification performance reduction with thick plumes. Fig. 10a and Fig. 10c have maximum CLs of 0.1, which corresponds to $\max(a\,s(\lambda)) = 0.1$ in Beer’s law of Eq. 2 and the measurement equation Eq. 1. This corresponds to a minimum transmission of $\min(\tau_p) = e^{-0.1}$. Therefore, the minimum transmission for the plume is about 90% in this region. The change in signal is approximately linear with respect to CL in the small-CL region; however, in Fig. 10b and Fig. 10d, a larger range of CLs is shown. At a scaled CL of 10, the plume is almost completely opaque ($\tau_p \approx 0$) at its maximum absorption channel.
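
The transmission values quoted above can be checked with a few lines of Python. The sketch assumes that Beer’s law takes the form exp(-CL · s) at a single channel, with the signature normalized so that its maximum is 1; the function name and the list of CL values are illustrative only.

```python
import math


def transmission(cl, signature_value=1.0):
    """Plume transmission exp(-CL * s) at one spectral channel."""
    return math.exp(-cl * signature_value)


# Transmission at the strongest absorption channel (normalized signature = 1),
# together with the first-order (linear) approximation 1 - CL.
table = [(cl, transmission(cl), 1.0 - cl) for cl in (0.027, 0.1, 1.0, 10.0)]

thin = transmission(0.1)     # thin-plume end of the first CL range
thick = transmission(10.0)   # thick-plume end of the second CL range
```

For CL values up to 0.1 the linear approximation stays within about half a percentage point of the exponential, whereas at a CL of 10 the exponential is essentially zero and the linear approximation is meaningless.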

For the embedded plume with a single gas, the identification performance for both BMA and ACE for a range of CLs is shown in Fig. 10a. Performance is plotted with respect to CL for thresholds from 0.1 to 0.99 for BMA and from 0.1 to 0.9 for ACE. For most of the selected thresholds, BMA’s performance increases with CL and is generally higher than ACE’s performance for low CLs. After peak performance, BMA’s performance slowly decreases, while ACE reaches a peak quickly and then decreases quickly. As the plume gets even thicker, as in Fig. 10b, the performance of BMA drops off from the maximum but ACE’s performance has several peaks, depending on the threshold. The plume becomes too thick for BMA to distinguish individual gases and performance degrades substantially. However, with a very high ACE threshold, decent performance for very thick plumes can be achieved. Using ACE, it is possible to set a high enough threshold to separate the correct gas from the others when the plume is very thick. However, this sacrifices performance at small CLs, which is generally where standoff systems are expected to operate. For both algorithms, the general trend is that as CL increases identification performance increases, then reaches a peak, and then begins to degrade before completely failing. The poor performance of both algorithms for very thick plumes indicates that multiple gases have similar scores and cannot be separated by a single threshold, or that incorrect gases are being identified. When the plume becomes very thick, both techniques fail in identifying the plume and should not be used.

To test performance for mixtures, two gases were embedded in the same location as the previous experiment. The CLs of both gases were varied. In Fig. 10c, the performance of both algorithms is shown for a range of CLs. The trends are similar to the previous experiment except that performance of the detector bank is uniformly worse than BMA. Again, the problem ACE has in this case is that multiple gases have similar scores, and that the correct mixture of two gases is not considered by ACE. BMA gives both correct gases high scores because the model containing the mixture has a relatively high probability. The result is that BMA performs well for this mixture relative to ACE. In Fig. 10d, a similar degradation in identification performance is seen as with the single gas embedding. However, as compared to the single gas embedding, BMA performs better for the mixture over a wider range of CLs than ACE.

Fig 10: Identification performance of ACE and BMA with a plume at various concentration pathlengths (CLs) and thresholds. (a-b) Single gas embedded in cube. (c-d) A mixture of two gases embedded in cube. Thresholds were uniformly spaced between 0.1 and 0.9 for ACE, and 0.1 and 0.99 for BMA.


5.4  Detection  Followed  by  Identification 

The detection performance of the ACE filter bank is relatively high compared to BMA, but the identification performance of ACE is lower than BMA. We argue that combining the two algorithms in cascade leads to a system with superior performance characteristics compared to either algorithm individually. The cascaded system, shown in Fig. 11, uses the ACE detector bank as a first pass and then passes only the hits to the BMA identifier. Using the ACE bank as a first pass for the data can yield a high plume detection rate at a low false alarm rate. Each pixel that passes the detection threshold is then passed to BMA, which makes a final identification decision about those pixels. In this section, the cascaded system is evaluated using the same embedding scenarios as before.
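
The per-pixel decision logic of such a cascade can be sketched in Python as below. The `detector` and `identifier` callables stand in for the ACE bank and BMA, respectively; they, the dictionary layout of a pixel, and the score values are hypothetical placeholders, not the implementations evaluated here.

```python
def cascade(pixel, detector, identifier, det_threshold, id_threshold):
    """Return the set of gas indices reported for one pixel.

    The identifier is evaluated only if the largest detector score
    exceeds the detection threshold.
    """
    det_scores = detector(pixel)
    if max(det_scores) <= det_threshold:
        return set()  # rejected by the detector bank; identifier never runs
    id_probs = identifier(pixel)
    return {k for k, p in enumerate(id_probs) if p > id_threshold}


# Stand-in pixels carrying precomputed, made-up scores for a 3-gas library.
background_pixel = {"ace": [0.02, 0.05, 0.01], "bma": [0.70, 0.10, 0.20]}
plume_pixel = {"ace": [0.55, 0.50, 0.03], "bma": [0.97, 0.08, 0.01]}


def detector(px):
    return px["ace"]


def identifier(px):
    return px["bma"]


bg_result = cascade(background_pixel, detector, identifier, 0.1, 0.5)
plume_result = cascade(plume_pixel, detector, identifier, 0.1, 0.5)
```

In this toy case the identifier alone would raise a false alarm on the background pixel, but the detection stage suppresses it; on the plume pixel the detector scores for gases 0 and 1 are similar, and the identifier resolves the ambiguity.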
[Block diagram: a detection stage with a detection threshold, followed by an identification stage with an identification threshold. Only pixels that pass the detection threshold are processed by the identifier.]

Fig 11: The cascaded system design.

The cascaded system was run on the embedded data using pairs of ACE and BMA thresholds. The number of false alarms for a particular threshold is constant with respect to CL, and is determined primarily by the ACE threshold. We selected two ACE thresholds: 0.1 and 0.36. The corresponding PFAs for both ACE and the cascaded system are $3 \times 10^{-3}$ and zero; the second threshold is high enough so that no background pixels pass the ACE threshold. The system was tested at 10 evenly spaced BMA thresholds ranging from 0.1 to 0.99. The PFAs for BMA alone at these thresholds are 0.002 at the highest threshold and 0.99 at the lowest threshold.

The  resulting  identification  performance  curves  for  the  single-gas  embedding  are  shown  for  the 
low  ACE  threshold  in  Fig.  12a  and  for  the  higher  threshold  in  Fig.  12b.  The  results  when  two  gases 
were  embedded  are  shown  in  Fig.  12c  and  Fig.  12d  for  the  low  and  high  ACE  thresholds.  The  blue 
dashed  curve  shows  the  performance  of  the  ACE  detector  bank  alone;  the  solid  green  curves  show 
BMA’s  performance  alone;  the  red  dotted  curves  are  the  cascaded  system’s  performance. 

The cost of cascading the two algorithms compared to ACE alone is generally worse identification performance at low CLs, which is more pronounced in Fig. 12d. However, at higher CLs, the cascaded system achieves better performance than the ACE bank for most choices of BMA threshold. Selecting the lowest BMA threshold of 0.1 actually results in worse performance than the ACE system alone for the single-gas embedding. Selecting the lower ACE threshold leads to a smaller difference between the green and red curves at smaller CLs but leads to a higher false alarm rate. In practice, the maximum operational false alarm rate will dictate what ACE threshold to select. However, it is unclear how best to set a BMA threshold in the cascaded system. From our experiments, the trends show that low thresholds lead to higher identification performance when the plume is very thin, but become comparatively worse as the plume thickens.

The results in Fig. 12d are a case where the combined system does substantially better than ACE alone because the highest ACE scores occur for a gas that is not present in the plume. In this case, the plume passes the detection threshold when it is sufficiently thick, but the correct gases are not identified until the identifier stage. Instead, the mixture is incorrectly identified by the ACE detector as a completely different gas in the library. The detector is still performing the vital role of reducing the overall false alarm rate since using BMA alone has a substantially higher false alarm rate, as illustrated in Fig. 7. The second pass by BMA correctly identifies the mixture more often and substantially improves identification performance. However, using ACE at the higher 0.36 threshold leads to degraded identification performance for thinner plumes.

Since the probability of false alarm for the identifier is substantially higher than the detector bank, the overall PFA should be set using the detector bank. It is tempting to set a threshold for the identifier that maximizes identification performance; however, the maximum depends on the properties of the plume, which are known for synthetic data, but are unknown in real data. Even with good plume models, it is impractical to try to find a good threshold over all possible simulated scenarios. Based on our results, BMA thresholds greater than 0.5 showed decent performance over a range of CLs. The identification performance of the system is somewhat insensitive to the BMA threshold except at the extremes close to 0 and close to 1. How to select the identification threshold is an aspect for future work.

Fig 12: Performance of the cascaded system for BMA thresholds from 0.1 to 0.99. (a-b) Performance for the single-gas embedded data with ACE thresholds of 0.1 and 0.36 respectively. (c-d) Performance using the two-gas embedded data.


6  Conclusions  and  Future  Work 

The two main contributions of this work are the development of a performance metric for the evaluation of chemical plume detection and identification algorithms, and a demonstration that a detector followed by an identifier yields superior performance compared to using either alone. The approach to performance evaluation using a weighted confusion matrix and performance evaluation using the Dice metric are novel in this area of remote sensing. We applied our metric to quantitatively demonstrate that a cascaded detector and identifier can attain a high PD and low PFA, while also achieving a high ID score. Each constituent algorithm, by contrast, can achieve only one of these goals.

In the future, other types of algorithms should be evaluated, and the number of datasets expanded. Our study here is not exhaustive, but provides a framework for further investigations. In this work, a single dataset with a single synthetic plume was used, but a range of concentrations and chemical mixtures should be incorporated into future work. As discussed previously, a study of all possible combinations of parameters is impractical; instead, smart experimental design can indicate what the overall trends are. In particular, a wider range of parameters and background clutter data will give insight into the problem of threshold selection for a cascaded system.